-
Notifications
You must be signed in to change notification settings - Fork 26
Fix postgres op_id sequence intialization #260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 90b55b8 The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue in Postgres storage where the op_id_sequence may be incorrectly initialized, leading to op_id conflicts during checkpoint creation. Key changes include:
- Adding a new test to cover the op_id initialization edge case.
- Refactoring PostgresBucketBatch to use a centralized getLastOpIdSequence() method.
- Updating sync rules storage to check for a non-null last_flushed_op.
- A changeset documenting the patch for Postgres storage.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/service-core-tests/src/tests/register-data-storage-tests.ts | Added a test case for op_id initialization edge case. |
| modules/module-postgres-storage/src/storage/batch/PostgresBucketBatch.ts | Refactored to use a shared function for reading the op_id_sequence. |
| modules/module-postgres-storage/src/storage/PostgresSyncRulesStorage.ts | Updated condition to explicitly check for non-null last_flushed_op. |
| .changeset/sixty-melons-bow.md | Added changeset for patch, with a spelling issue in the commit title. |
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix!
This is a rare edge case, but can be consistently triggered in some development setups.
Requirements to reproduce:
where falsein the sync rules.The issue is in how op_id_sequence is initialized in postgres storage. We use
select last_value from op_id_sequenceto get the op_id for a checkpoint, andselect nextval('op_id_sequence')for each op_id in bucket_data.last_valuewill generally return the largest op_id value in bucket_data, andnextvalwill always generate a larger op_id. However, if we haven't generated any op_id yet, and generate the first checkpoint, then we get the case thatlast_valueis 1 (used for the first checkpoint). Then for the next checkpoint,nextvalgives 1, giving another checkpoint with the same op_id. Together with caching of checksums, this results in a op_id conflict that gives the "Checksum failed" loop.To see what's going on, run
curl http://localhost:8080/sync/stream -X POST -H "Authorization: Bearer <token>. You should see no data initially:Then add a record to sync. The above stream would not get a new checkpoint, while re-running will give this:
Note above:
last_op_idis still 1.This causes the checksums to fail on the client.
The issue specifically happens because the count and checksum for op_id 1 is cached and never refreshed. Even when more data is added, the op_id is incrementally updated, and still remains incorrect. The service recovers when it is restarted, clearing the cache.
However, the cache is not the core issue here - op_id 1 being re-used for two checkpoints is the issue.
The fix
One simple fix is to just call
select nextval('op_id_sequence')once as part of the migrations. This initializes the sequence, avoiding this issue completely. However, this also changes the op_id sequence for all our test fixtures. Rather than updating them all, and accounting for differences between mongodb and postgres storage, this fix just checksis_calledon the sequence, and defaults to 0 instead of 1 for that case.